---
title: "MetaLlamaChatGenerator"
id: metallamachatgenerator
slug: "/metallamachatgenerator"
description: "This component enables chat completion with any model available through the Meta Llama API."
---

# MetaLlamaChatGenerator

This component enables chat completion with any model available through the Meta Llama API.

|                                        |                                                                                                             |
| -------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| **Most common position in a pipeline** | After a [ChatPromptBuilder](../builders/chatpromptbuilder.mdx)                                                          |
| **Mandatory init variables**           | “api_key”: A Meta Llama API key. Can be set with the `LLAMA_API_KEY` env variable or passed to the `init()` method. |
| **Mandatory run variables**            | “messages”: A list of [ChatMessage](../../concepts/data-classes/chatmessage.mdx) objects                                                |
| **Output variables**                   | “replies”: A list of [ChatMessage](../../concepts/data-classes/chatmessage.mdx) objects                                                 |
| **API reference**                      | [Meta Llama API](/reference/integrations-meta-llama)                                                               |
| **GitHub link**                        | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/meta_llama                |

## Overview

The `MetaLlamaChatGenerator` enables you to use multiple Meta Llama models by making chat completion calls to the Meta [Llama API](https://llama.developer.meta.com/?utm_source=partner-haystack&utm_medium=website). The default model is `Llama-4-Scout-17B-16E-Instruct-FP8`.

Currently available models are:

| Model ID                                 | Input context length | Output context length | Input Modalities | Output Modalities |
| ---------------------------------------- | -------------------- | --------------------- | ---------------- | ----------------- |
| `Llama-4-Scout-17B-16E-Instruct-FP8`     | 128k                 | 4028                  | Text, Image      | Text              |
| `Llama-4-Maverick-17B-128E-Instruct-FP8` | 128k                 | 4028                  | Text, Image      | Text              |
| `Llama-3.3-70B-Instruct`                 | 128k                 | 4028                  | Text             | Text              |
| `Llama-3.3-8B-Instruct`                  | 128k                 | 4028                  | Text             | Text              |

This component uses the same `ChatMessage` format as other Haystack Chat Generators for structured input and output. For more information, see the [ChatMessage documentation](../../concepts/data-classes/chatmessage.mdx).
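
For example, the input to the generator is simply a list of `ChatMessage` objects:

```python
from haystack.dataclasses import ChatMessage

# A typical multi-turn input: a system message followed by a user message
messages = [
    ChatMessage.from_system("You are a concise technical assistant."),
    ChatMessage.from_user("Explain retrieval-augmented generation in one sentence."),
]
```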

It is also fully compatible with Haystack [Tools](../../tools/tool.mdx) and [Toolsets](../../tools/toolset.mdx) that allow function-calling capabilities with supported models.
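
Below is a minimal tool-calling sketch. It assumes the generator accepts Haystack `Tool` objects through a `tools` parameter, as other Haystack Chat Generators do, and uses a hypothetical `get_weather` helper for illustration:

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

def get_weather(city: str) -> str:
    """Hypothetical helper that returns a canned weather report."""
    return f"The weather in {city} is sunny."

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string", "description": "The city name."}},
        "required": ["city"],
    },
    function=get_weather,
)

# Assumes a `tools` init parameter, mirroring other Haystack Chat Generators
llm = MetaLlamaChatGenerator(tools=[weather_tool])
response = llm.run([ChatMessage.from_user("What is the weather in Paris?")])

# If the model decides to call the tool, the reply carries tool calls instead of plain text
print(response["replies"][0].tool_calls)
```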

### Initialization

To use this integration, you must have a Meta Llama API key. You can provide it with the `LLAMA_API_KEY` environment variable or by using a [Secret](../../concepts/secret-management.mdx).

Then, install the `meta-llama-haystack` integration:

```shell
pip install meta-llama-haystack
```
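
For example, you can initialize the generator with an explicit `Secret` and, optionally, a non-default model:

```python
from haystack.utils import Secret
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Reads the key from the LLAMA_API_KEY environment variable (the default behavior)
llm = MetaLlamaChatGenerator(api_key=Secret.from_env_var("LLAMA_API_KEY"))

# Or select a specific model instead of the default Llama-4-Scout-17B-16E-Instruct-FP8
llm = MetaLlamaChatGenerator(
    api_key=Secret.from_env_var("LLAMA_API_KEY"),
    model="Llama-3.3-70B-Instruct",
)
```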

### Streaming

`MetaLlamaChatGenerator` supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) responses from the LLM, allowing tokens to be emitted as they are generated. To enable streaming, pass a callable to the `streaming_callback` parameter during initialization.
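
For example, Haystack's built-in `print_streaming_chunk` utility can serve as the callback, printing each chunk to stdout as it arrives:

```python
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Each streaming chunk is printed as soon as the model emits it
llm = MetaLlamaChatGenerator(streaming_callback=print_streaming_chunk)
llm.run([ChatMessage.from_user("Summarize RAG in one sentence.")])
```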

## Usage

### On its own

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator()
response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"][0].text)
```

With streaming and model routing:

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator(
    model="Llama-3.3-8B-Instruct",
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)

response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)

## check the model used for the response
print("\n\n Model used: ", response["replies"][0].meta["model"])
```

### In a pipeline

```python
## To run this example, you will need to set a `LLAMA_API_KEY` environment variable.

from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

## Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ]
)

## Build a RAG pipeline
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:"
    )
]

## Define required variables explicitly
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"question", "documents"})

retriever = InMemoryBM25Retriever(document_store=document_store)
llm = MetaLlamaChatGenerator(
    api_key=Secret.from_env_var("LLAMA_API_KEY"),
    streaming_callback=print_streaming_chunk,
)

rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")

## Ask a question
question = "Who lives in Paris?"
rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)
```
